The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice and the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
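As a rough illustration of the patch-based strategy mentioned above for samples that are too large to process at once, the following sketch tiles a 3D volume into fixed-size patches. It is not taken from the survey; the function name `extract_patches`, the patch size, and the stride are assumptions chosen for the example.

```python
import numpy as np

def extract_patches(volume, patch_size, stride):
    """Yield (corner, patch) pairs that tile a 3D volume.

    Hypothetical helper for illustration; assumes every volume
    dimension is at least as large as the matching patch dimension.
    """
    starts = []
    for dim, p, s in zip(volume.shape, patch_size, stride):
        axis_starts = list(range(0, dim - p + 1, s))
        # Add a final start so the last patch reaches the border.
        if axis_starts[-1] != dim - p:
            axis_starts.append(dim - p)
        starts.append(axis_starts)
    pz, py, px = patch_size
    for z in starts[0]:
        for y in starts[1]:
            for x in starts[2]:
                yield (z, y, x), volume[z:z + pz, y:y + py, x:x + px]

# Example: a small synthetic volume split into 4x4x4 patches.
volume = np.arange(10 * 10 * 10, dtype=np.float32).reshape(10, 10, 10)
patches = list(extract_patches(volume, (4, 4, 4), (4, 4, 4)))
```

Patches near the border overlap their neighbours here; a training loop would typically sample such patches at random, while inference would average predictions in the overlapping regions.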
translated by Google Translate
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem by asking participants to design an efficient quantized image super-resolution solution that demonstrates real-time performance on mobile NPUs. Participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU, reaching up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
Colonoscopy frames are complex, with intrinsic frame-formation artefacts such as light reflections and a wide diversity of polyp types and shapes. In addition, the publicly available polyp segmentation training datasets are limited, small, and imbalanced. As a result, automated polyp segmentation with a deep neural network remains an open challenge, because training on small datasets leads to overfitting. We propose a simple yet effective polyp segmentation pipeline that couples the segmentation (FCN) and classification (CNN) tasks. We find that interactive weight transfer between dense and coarse vision tasks mitigates overfitting during learning, which motivates us to design a new training scheme within our segmentation pipeline. Our method is evaluated on the CVC-EndoSceneStill and Kvasir-SEG datasets. It achieves Polyp-IoU improvements of 4.34% and 5.70% over state-of-the-art methods on the EndoSceneStill and Kvasir-SEG datasets, respectively.
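For readers unfamiliar with the metric, intersection over union (IoU) for a binary mask can be sketched as follows. This is the generic definition; whether the Polyp-IoU reported above is computed exactly this way (for example per image or per dataset) is an assumption, and the function name `binary_iou` is hypothetical.

```python
import numpy as np

def binary_iou(pred, target):
    """Intersection over union of two binary masks (generic definition)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treat this as perfect agreement (a convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection) / float(union)

# Example: the prediction covers two pixels, one of which is truly polyp.
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = binary_iou(pred, target)  # 1 shared pixel / 2 pixels in the union
```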
Federated learning (FL) enables the building of robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training libraries (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.
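To make the idea of learning without centralizing data concrete, here is a minimal sketch of weighted federated averaging in plain NumPy. It deliberately does not use the NVIDIA FLARE API; the function name `federated_average` and the dictionary-of-arrays layout are assumptions made for illustration only.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-client parameter dicts, weighted by local sample counts.

    Hypothetical helper: each client trains locally and shares only its
    parameters, never its data; the server combines them as shown here.
    """
    total = float(sum(client_sizes))
    averaged = {}
    for name in client_weights[0]:
        averaged[name] = sum(
            w[name] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
    return averaged

# Example: two clients with different amounts of local data.
clients = [
    {"layer": np.array([1.0, 1.0])},
    {"layer": np.array([3.0, 3.0])},
]
global_weights = federated_average(clients, client_sizes=[1, 3])
```

In a real deployment the aggregation step would sit behind the framework's own workflow and security layers; the sketch only shows the arithmetic of one aggregation round.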
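Few-shot learning is needed in medical image analysis to support efficient use of labelled image data for classifying or segmenting new classes, a task that otherwise requires many more training images and expert annotations. This work describes a fully 3D prototypical few-shot segmentation algorithm, such that the trained networks can be effectively adapted to clinically interesting structures that are absent from training, using only a few labelled images from a different institution. First, to compensate for the widely recognised spatial variability between institutions in episodic adaptation to novel classes, a novel spatial registration mechanism is integrated into prototypical learning, consisting of a segmentation head and a spatial alignment module. Second, to assist training when the observed alignment is imperfect, a support mask conditioning module is proposed to further utilise the annotations available from the support images. Experiments are presented for an application of segmenting eight anatomical structures important for interventional planning, using a dataset of 589 pelvic T2-weighted MR images acquired at several institutions. The results demonstrate the efficacy of each of the 3D formulation, the spatial registration, and the support mask conditioning, all of which made positive contributions independently or collectively. Compared with previously proposed 2D alternatives, the few-shot segmentation performance was improved with statistical significance, regardless of whether the support data came from the same or a different institution.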
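In this work, we consider the task of pairwise cross-modality image registration, which may benefit from exploiting additional images that are available only at training time and come from a modality different from those being registered. As an example, we focus on aligning intra-subject multiparametric magnetic resonance (mpMR) images, between T2-weighted (T2w) scans and diffusion-weighted scans with high b-value (DWI$_{high-b}$). For the application of localising tumours in mpMR images, diffusion scans with zero b-value (DWI$_{b=0}$) are considered easier to register to T2w because corresponding features are available. We propose a learning-from-privileged-modality algorithm that uses the training-only imaging modality DWI$_{b=0}$ to support this challenging multi-modality registration problem. We present experimental results based on 369 sets of 3D multiparametric MRI images from 356 prostate cancer patients, with results on the image pairs compared against 7.96 mm before registration. The results also show that the proposed learning-based registration networks enable efficient registration with comparable or better accuracy, compared with a classical iterative algorithm and other tested learning-based methods with or without the additional modality. These compared algorithms also failed to produce any significantly improved alignment between DWI$_{high-b}$ and T2w in this challenging application.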
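Grasp pose estimation is an important problem for robots interacting with the real world. However, most existing methods require either exact 3D object models available beforehand or a large amount of grasp annotations for training. To avoid these problems, we propose TransGrasp, a category-level grasp pose estimation method that predicts grasp poses for a category of objects by labelling only one object instance. Specifically, we transfer grasp poses across the category based on shape correspondences, and we propose a grasp pose refinement module that further fine-tunes the gripper's grasp pose to ensure successful grasps. Experiments demonstrate the effectiveness of our method in achieving high-quality grasps with the transferred grasp poses. Our code is available at https://github.com/yanjh97/transgrasp.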
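The rise of deep neural networks provides an important driver for optimizing recommender systems. However, the success of recommender systems lies in delicate architecture fabrication, which calls for neural architecture search (NAS) to further improve their modeling. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures through weight sharing. To overcome the challenges of data multi-modality and architecture heterogeneity, NASRec establishes a large supernet (i.e., search space) to search over full architectures. The supernet incorporates versatile operator choices and dense connectivity, minimizing human effort. The scale and heterogeneity of NASRec pose challenges for the search, such as training inefficiency, operator imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our results on three click-through rate (CTR) prediction benchmarks show that NASRec can outperform both manually designed models and existing NAS methods, achieving state-of-the-art performance.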
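Recent studies of compression show that distortion and perceptual quality are at odds with each other, which gives rise to the distortion-perception (D-P) tradeoff. Intuitively, attaining different levels of perceptual quality requires training different decoders. In this paper, we present a nontrivial finding: only two decoders are sufficient to achieve arbitrary (an infinite number of different) D-P tradeoffs. We prove that arbitrary points on the D-P tradeoff bound can be achieved by a simple linear interpolation between the output of a minimum-MSE decoder and that of a specifically constructed perfect perceptual decoder. Meanwhile, the perceptual quality (in terms of the squared Wasserstein-2 distance metric) can be quantitatively controlled by the interpolation factor. Furthermore, to construct a perfect perceptual decoder, we propose two theoretically optimal training frameworks. The new frameworks differ from the heuristic distortion-plus-adversarial-loss framework widely used in existing methods; they are not only theoretically optimal but also yield state-of-the-art performance in practical perceptual decoding. Finally, we validate our theoretical findings and demonstrate the superiority of our frameworks through experiments. Code is available at: https://github.com/zeyuyan/controllable-cocceptual-compression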
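The interpolation step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' released code: the function name `interpolate_decoders` and the convention that a factor of 0 selects the minimum-MSE output and 1 selects the perfect perceptual output are assumptions.

```python
import numpy as np

def interpolate_decoders(x_mse, x_perceptual, alpha):
    """Linearly blend two decoder outputs for the same compressed input.

    Assumed convention: alpha=0 returns the minimum-MSE reconstruction,
    alpha=1 returns the perfect-perceptual reconstruction.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    x_mse = np.asarray(x_mse, dtype=float)
    x_perceptual = np.asarray(x_perceptual, dtype=float)
    return (1.0 - alpha) * x_mse + alpha * x_perceptual

# Example with stand-in decoder outputs (random arrays, not real decodings).
rng = np.random.default_rng(0)
x_mse = rng.random((8, 8))
x_perceptual = rng.random((8, 8))
x_mid = interpolate_decoders(x_mse, x_perceptual, alpha=0.5)
```

Because both decoders are run once per input, sweeping the factor afterwards costs only one blend per operating point and no retraining.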
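Combining information from multi-view images is crucial for improving the performance and robustness of automated methods for disease diagnosis. However, because multi-view images are not aligned, building correlations and fusing data across views remains largely an open problem. In this study, we present TransFusion, a Transformer-based architecture that merges divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the divergent fusion attention (DIFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose multi-scale attention (MSA) to collect global correspondences of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View & Multi-Center Right Ventricular Segmentation in Cardiac MRI (M&Ms-2) challenge cohort. TransFusion demonstrates leading performance against state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation.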